New middleware doubles GPU computational efficiency for AI workloads in trials, says Fujitsu

“The rapid expansion of compute infrastructure to support training for genAI has created a major electrical power availability challenge,” said a Gartner research note on emerging technologies for energy-efficient generative AI compute systems by researchers Gaurav Gupta, Menglin Cao, Alan Priestley, Akhil Singh, and Joseph Unsworth.

This means operators of AI data centers must find solutions now to mitigate the effects on their operations, which include increased costs, insufficient power availability, and poorer sustainability performance. “All of these will be eventually passed on to data center operators’ customers and end users,” the researchers noted.

At the same time, data centers must contend with the performance bottlenecks that the drive to GPU-assisted AI is causing, noted Eckhardt Fischer, senior research analyst for IDC. “Any improvement in the computer system to reduce this bottleneck will generally show a corresponding improvement in output,” he observed.

These bottlenecks for AI/genAI compute requirements include memory and networking, because “even the current Moore’s Law can’t keep up with explosive compute needs,” noted Gartner’s Gupta.

Optimizing resource allocation

Fujitsu’s AI computing broker middleware aims to solve this in part by combining adaptive GPU allocator technology, which Fujitsu developed in November 2023, with AI-processing optimization technologies, the company said. This allows the middleware to automatically identify and optimize CPU and GPU resource allocation for AI processing across multiple programs, giving priority to processes with high execution efficiency.

Unlike conventional resource allocation, which assigns resources on a per-job basis, Fujitsu’s AI computing broker dynamically allocates them on a per-GPU basis, the company said. This is aimed at improving availability rates and allowing numerous AI processes to run concurrently without concern for GPU memory usage or physical capacity.
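The difference between the two allocation styles can be illustrated with a toy simulation. The Python sketch below is not Fujitsu's implementation, which the company has not detailed here; the names `Job`, `gpu_fraction`, `per_job_makespan`, and `shared_gpu_makespan` are hypothetical, and treating “execution efficiency” as the share of a job's time spent on the GPU is an assumption made only for illustration. It compares a single GPU held by each job for its whole lifetime with a GPU that is handed to another job whenever the current one drops into a CPU-only phase.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical workload: a sequence of (kind, duration) phases."""
    name: str
    phases: list  # e.g. [("cpu", 2), ("gpu", 2)], durations in time units


def gpu_fraction(job):
    """Assumed stand-in for 'execution efficiency': share of time on the GPU."""
    total = sum(d for _, d in job.phases)
    gpu = sum(d for kind, d in job.phases if kind == "gpu")
    return gpu / total if total else 0.0


def per_job_makespan(jobs):
    """Per-job allocation: each job holds the GPU from start to finish,
    so jobs run one after another, CPU phases included."""
    return sum(d for job in jobs for _, d in job.phases)


def shared_gpu_makespan(jobs):
    """Dynamic allocation of one GPU: on every tick the GPU goes to the
    waiting job with the highest assumed efficiency, while jobs in a CPU
    phase keep running (CPUs are assumed to be plentiful)."""
    # Expand each job into unit-time steps, e.g. ["cpu", "cpu", "gpu", "gpu"].
    remaining = {
        job.name: [kind for kind, d in job.phases for _ in range(d)]
        for job in jobs
    }
    efficiency = {job.name: gpu_fraction(job) for job in jobs}
    ticks = 0
    while any(remaining.values()):
        ticks += 1
        waiting = [n for n, steps in remaining.items() if steps and steps[0] == "gpu"]
        holder = max(waiting, key=lambda n: efficiency[n]) if waiting else None
        for name, steps in remaining.items():
            if not steps:
                continue
            # CPU steps always advance; a GPU step advances only for the holder.
            if steps[0] == "cpu" or name == holder:
                steps.pop(0)
    return ticks


if __name__ == "__main__":
    jobs = [
        Job("A", [("cpu", 2), ("gpu", 2)]),
        Job("B", [("gpu", 2), ("cpu", 2)]),
    ]
    print("per-job allocation:", per_job_makespan(jobs), "time units")
    print("shared GPU:", shared_gpu_makespan(jobs), "time units")
```

In this deliberately contrived workload the two jobs never need the GPU at the same moment, so sharing it halves the total run time from 8 to 4 time units. Real workloads overlap less neatly, and the figures here say nothing about the gains Fujitsu reported in its trials.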


